Robust Bayes classifiers
Naive Bayes classifiers provide an efficient and scalable approach to supervised classification problems. When some entries in the training set are missing, methods exist to learn these classifiers under some assumptions about the pattern of missing data. Unfortunately, reliable information about the pattern of missing data may not be readily available, and recent experimental results show that enforcing an incorrect assumption about the pattern of missing data produces a dramatic decrease in the accuracy of the classifier. This paper introduces a Robust Bayes Classifier (rbc) able to handle incomplete databases with no assumption about the pattern of missing data. To avoid assumptions, the rbc bounds all the possible probability estimates within intervals using a specialized estimation method. These intervals are then used to classify new cases by computing intervals on the posterior probability distributions over the classes given a new case and by ranking the intervals according to some criteria. We provide two scoring methods to rank intervals and a decision-theoretic approach to trade off the risk of an erroneous classification against the choice of not classifying a case unequivocally. This decision-theoretic approach can also be used to assess whether adopting assumptions about the pattern of missing data is worthwhile. The proposed approach is evaluated on twenty publicly available databases.
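The interval-based estimation the abstract describes can be illustrated with a minimal sketch (not the paper's exact estimator): for each conditional probability, every missing entry is allocated either entirely to the value of interest or entirely away from it, yielding a lower and an upper bound on the estimate.

```python
# Hedged sketch of interval probability estimation under unknown missingness:
# each missing entry could have held any value, so we bound P(X = v | class)
# by the two extreme completions of the data.
def interval_estimate(observed, missing, value):
    """observed: observed feature values for one class;
    missing: number of missing entries for that class."""
    n = len(observed) + missing
    k = sum(1 for x in observed if x == value)
    # lower bound: no missing entry was `value`; upper: all of them were
    return k / n, (k + missing) / n

lo, hi = interval_estimate(["a", "a", "b"], missing=2, value="a")
# 2/5 <= P(X = "a" | class) <= 4/5
```

Classification then compares such intervals on the class posteriors rather than point estimates, which is why the paper needs ranking criteria for overlapping intervals.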
Bayesian Clustering by Dynamics
This paper introduces a Bayesian method for clustering dynamic processes. The method models dynamics as Markov chains and then applies an agglomerative clustering procedure to discover the most probable set of clusters capturing different dynamics. To increase efficiency, the method uses an entropy-based heuristic search strategy. A controlled experiment suggests that the method is very accurate when applied to artificial time series in a broad range of conditions and, when applied to clustering sensor data from mobile robots, it produces clusters that are meaningful in the domain of application.
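The core idea, estimating a Markov chain per time series and agglomeratively merging the series with the closest dynamics, can be sketched as follows. The symmetrised KL distance and Dirichlet smoothing here are assumptions of this sketch, standing in for the paper's Bayesian posterior and entropy-based heuristic.

```python
import numpy as np

def transition_matrix(seq, n_states, alpha=1.0):
    """Estimate a first-order Markov transition matrix with Dirichlet smoothing."""
    counts = np.full((n_states, n_states), alpha)
    for a, b in zip(seq[:-1], seq[1:]):
        counts[a, b] += 1
    return counts / counts.sum(axis=1, keepdims=True)

def kl_distance(P, Q):
    """Symmetrised KL divergence between transition matrices, averaged over rows."""
    kl = lambda p, q: float(np.sum(p * np.log(p / q)))
    return np.mean([kl(P[i], Q[i]) + kl(Q[i], P[i]) for i in range(len(P))])

# Three short state sequences: the first two share alternating dynamics.
s1 = [0, 1, 0, 1, 0, 1, 0, 1]
s2 = [0, 1, 0, 1, 1, 0, 1, 0]
s3 = [0, 0, 0, 0, 1, 0, 0, 0]
Ps = [transition_matrix(s, 2) for s in (s1, s2, s3)]

# One agglomerative step: merge the pair of chains with the closest dynamics.
pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)]
closest = min(pairs, key=lambda ij: kl_distance(Ps[ij[0]], Ps[ij[1]]))
# closest == (0, 1): the two alternating sequences are merged first
```

Repeating the merge step until a stopping criterion fires yields the cluster hierarchy; the paper's contribution is doing this search efficiently under a Bayesian score.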
Naïve Bayesian Classifier and Genetic Risk Score for Genetic Risk Prediction of a Categorical Trait: Not so Different after all!
One of the most popular modeling approaches to genetic risk prediction is to use a summary of risk alleles in the form of an unweighted or a weighted genetic risk score, with weights that relate to the odds for the phenotype in carriers of the individual alleles. Recent contributions have proposed the use of Bayesian classification rules using Naïve Bayes classifiers. We examine the relation between the two approaches for genetic risk prediction and show that the methods are mathematically related. In addition, we study the properties of the two approaches and describe how they can be generalized to include various models of inheritance.
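The mathematical relation can be made concrete: under a naive Bayes model with independent binary risk alleles, the posterior log-odds of disease equals a weighted genetic risk score with per-allele log odds-ratio weights, plus a genotype-independent offset, so the two approaches rank individuals identically. The allele frequencies and prevalence below are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical per-allele carrier frequencies in cases (p1) and controls (p0):
p1 = [0.30, 0.55, 0.20]   # P(allele carried | case)
p0 = [0.10, 0.40, 0.15]   # P(allele carried | control)
prior_logodds = math.log(0.01 / 0.99)   # assumed disease prevalence of 1%

def nb_logodds(g):
    """Naive-Bayes posterior log-odds for genotype vector g (0/1 carrier flags)."""
    s = prior_logodds
    for gi, a, b in zip(g, p1, p0):
        s += math.log((a if gi else 1 - a) / (b if gi else 1 - b))
    return s

def weighted_grs(g):
    """Weighted genetic risk score with per-allele log odds-ratio weights."""
    w = [math.log(a * (1 - b) / (b * (1 - a))) for a, b in zip(p1, p0)]
    return sum(wi * gi for wi, gi in zip(w, g))

# The naive-Bayes score is the weighted GRS shifted by a constant offset
# (the log-odds of a non-carrier), so the two rank individuals identically.
offset = nb_logodds((0, 0, 0))
```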
Conditional clustering of temporal expression profiles
Background: Many microarray experiments produce temporal profiles in different biological conditions, but common clustering techniques are not able to analyze the data conditional on the biological conditions. Results: This article presents a novel technique to cluster data from time-course microarray experiments performed across several experimental conditions. Our algorithm uses polynomial models to describe the gene expression patterns over time, a full Bayesian approach with proper conjugate priors to make the algorithm invariant to linear transformations, and an iterative procedure to identify genes that have a common temporal expression profile across two or more experimental conditions, and genes that have a unique temporal profile in a specific condition. Conclusion: We use simulated data to evaluate the effectiveness of this new algorithm in finding the correct number of clusters and in identifying genes with common and unique profiles. We also use the algorithm to characterize the response of human T cells to stimulation of antigen-receptor signaling, using gene expression temporal profiles measured in six different biological conditions, and we identify common and unique genes. These studies suggest that the methodology proposed here is useful in identifying and distinguishing uniquely stimulated genes from commonly stimulated genes in response to variable stimuli. Software for using this clustering method is available from the project home page.
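One way to picture the common-versus-unique decision is to compare a joint polynomial fit shared across conditions with separate per-condition fits. The crude residual-based threshold below is an assumption of this sketch, standing in for the paper's Bayesian score, and all expression values and times are hypothetical.

```python
import numpy as np

t = np.array([0.0, 1.0, 2.0, 4.0, 6.0])        # sampling times (hypothetical)
# Expression of one gene measured in two conditions (hypothetical values):
y_cond1 = np.array([0.1, 0.9, 1.8, 3.9, 6.2])
y_cond2 = np.array([0.0, 1.1, 2.1, 4.0, 5.9])

def poly_rss(t, y, degree=2):
    """Residual sum of squares of a least-squares polynomial fit."""
    coeffs = np.polyfit(t, y, degree)
    return float(np.sum((np.polyval(coeffs, t) - y) ** 2))

# Joint fit: one polynomial profile shared by both conditions.
joint = poly_rss(np.concatenate([t, t]), np.concatenate([y_cond1, y_cond2]))
# Separate fits: each condition gets its own profile.
separate = poly_rss(t, y_cond1) + poly_rss(t, y_cond2)

# If the shared fit costs little extra, the gene plausibly has a common
# profile; the fixed threshold is a stand-in for the paper's Bayes factor.
common_profile = joint - separate < 1.0
```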
Robust outcome prediction for intensive care patients
Missing data are a major plague of medical databases in general, and of Intensive Care Unit databases in particular. The time pressure of work in an Intensive Care Unit pushes physicians to omit data randomly or to record them selectively. These different omission strategies give rise to different patterns of missing data, and the recommended approach of completing the database using median imputation and fitting a logistic regression model can lead to significant biases. This paper applies a new classification method, called the robust Bayes classifier, which does not rely on any particular assumption about the pattern of missing data, and compares it with the traditional median imputation approach on a database of 324 Intensive Care Unit patients.
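The bias the abstract warns about is easy to reproduce: when values go missing selectively rather than at random, median imputation pulls estimates toward the observed range. The measurements below are hypothetical, chosen only to show the effect.

```python
import statistics

# Hypothetical lab values; suppose physicians selectively omit high readings
# under time pressure, a missing-not-at-random pattern.
true_values = [3.1, 3.4, 3.6, 3.8, 7.9, 8.4, 9.1]
recorded = [v if v < 5 else None for v in true_values]   # high values go missing

observed = [v for v in recorded if v is not None]
median = statistics.median(observed)                      # 3.5, from low values only
imputed = [v if v is not None else median for v in recorded]

true_mean = statistics.mean(true_values)    # ~5.61
imputed_mean = statistics.mean(imputed)     # ~3.49: biased well below the truth
```

Any model fitted to the imputed column inherits this bias, which is why a method that bounds estimates instead of filling gaps can be safer here.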
Application of imaging techniques for the characterization of lumps behaviour in gas-solid fluidized-bed reactors
Gas-solid fluidized-bed reactors are often used in waste pyrolysis and gasification processes thanks to their excellent mixing properties, which guarantee temperature uniformity. However, this latter property can fail when large objects, such as lumps, are introduced or form in the system. Understanding the motion characteristics and thermal behaviour of lumps in a high-temperature fluidized-bed reactor can help determine how the presence of lumps impacts reactor performance, and this was the objective of this study. In particular, this work aims to assess how process variables and physical properties affect the segregation behaviour, dispersion coefficients and heat transfer coefficients of these lumps during operation. The system used in this work is a down-scaled pseudo-2D fluidized bed operated at ambient temperature and at fluidization velocities ranging between 1 Umf and 10 Umf. Rutile sand with four different mean particle sizes (60 μm, 100 μm, 153 μm and 215 μm) was used as bed material. Fabricated lumps were introduced into the fluidized bed to reproduce realistic conditions, as when lumps form in a high-temperature fluid bed. The density ratio between the lump and the bed material particles was varied between 0.32 and 0.55 to account for different lump compositions. X-ray digital radiography and infrared thermography were used to track the fabricated lumps and to obtain the time evolution of their temperature, respectively. The lump density was found not to have a significant effect on the lump dispersion coefficients or on the heat transfer coefficient. Optimal values of the fluidization velocity that guarantee proper lump mixing and a maximum heat transfer coefficient were obtained; the latter increases by up to a factor of 10 when the optimal fluidization velocity is selected. An increase in the bed material particle size was found to cause an increase in the dispersion coefficients and a decrease in the heat transfer coefficient. The trend of the heat transfer coefficient as a function of the fluidization velocity was found to vary significantly between different bed material particle sizes. A new correlation for the Nusselt number as a function of the object Reynolds number and of the size ratio between lump and bed material was obtained; it applies to cases where particle convection is the dominant mechanism of heat transfer. The results of this work provide important knowledge to minimize the impact of lumps on fluidized-bed reactors and to optimize their operation.
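A correlation of the kind described can be sketched as a power law in the object Reynolds number and the lump-to-particle size ratio. The coefficients a, b, c below are placeholders, not the fitted values from the paper, and the gas properties are illustrative.

```python
# Hedged sketch: a generic power-law Nusselt correlation of the form the
# abstract describes. Coefficients a, b, c are PLACEHOLDERS, not the paper's
# fitted values; use them only to see how h is recovered from Nu.
def nusselt(re_obj, size_ratio, a=0.5, b=0.6, c=0.3):
    """Nu = a * Re^b * (D_lump / d_p)^c  (placeholder coefficients)."""
    return a * re_obj**b * size_ratio**c

def heat_transfer_coeff(nu, k_gas, d_lump):
    """Invert the Nusselt definition Nu = h * D / k to get h = Nu * k / D."""
    return nu * k_gas / d_lump

# Illustrative lump (20 mm) in the 153 um bed, with air conductivity ~0.026 W/(m K):
nu = nusselt(re_obj=200.0, size_ratio=20e-3 / 153e-6)
h = heat_transfer_coeff(nu, k_gas=0.026, d_lump=20e-3)   # W/(m^2 K)
```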
Evolution and challenges in the design of computational systems for triage assistance
Compared with expert systems for specific disease diagnosis, knowledge-based systems that assist decision making in triage usually try to cover a much wider domain but can use only a smaller set of variables due to time restrictions, many of them subjective, so that accurate models are difficult to build. In this paper, we first study the criteria that most affect the performance of systems for triage assistance. Such criteria include whether principled approaches from machine learning can be used to increase accuracy and robustness and to represent uncertainty, whether data and model integration can be performed, and whether temporal evolution can be modeled to implement retriage or represent medication responses. Following the most important criteria, we explore current systems and identify some missing features that, if added, may lead to more accurate triage systems.